Token Mixing: Parameter-Efficient Transfer Learning from Image-Language to Video-Language

Abstract

Applying large-scale pre-trained image-language models to video-language tasks has recently become a trend, which brings two challenges. One is how to effectively transfer knowledge from static images to dynamic videos, and the other is how to deal with the prohibitive cost of full fine-tuning due to growing model size. Existing works that attempt to realize parameter-efficient image-language to video-language transfer learning can be categorized into two types: 1) appending a sequence of temporal transformer blocks after the 2D Vision Transformer (ViT), and 2) inserting a temporal block into the ViT architecture. While these two types of methods only require fine-tuning the newly added components, there are still many parameters to update, and they are only validated on a single video-language task. In this work, based on our analysis of the core ideas of different temporal modeling components in existing approaches, we propose a token mixing strategy to enable cross-frame interactions, which enables transfer from the pre-trained image-language model to video-language tasks by selecting and mixing a key set and a value set from the input video samples. As token mixing does not require the addition of any components or modules, we can directly partially fine-tune the pre-trained image-language model to achieve parameter efficiency. We carry out extensive experiments to compare our proposed token mixing method with other parameter-efficient transfer learning methods. Our token mixing method outperforms other methods on both understanding tasks and generation tasks. Besides, our method achieves new records on multiple video-language tasks. The code is available at https://github.com/yuqi657/video_language_model.
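The abstract only names the idea of mixing key and value sets across frames, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes single-head attention with identity projections, and a hypothetical mixing rule (`mix_tokens`) that swaps the first `n_mixed` key/value tokens of a frame for the tokens at the same positions in the previous frame. All function names and the mixing rule are illustrative assumptions. The point it shows is that cross-frame interaction can arise without adding any parameters.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(dim)])
    return out


def mix_tokens(frames, t, n_mixed):
    """ASSUMPTION (hypothetical rule): build the key/value set of frame t by
    taking the first n_mixed tokens from the previous frame and the rest from
    frame t itself. The first frame has no predecessor and is left unchanged."""
    prev = frames[max(t - 1, 0)]
    return prev[:n_mixed] + frames[t][n_mixed:]


def token_mixing_attention(frames, n_mixed):
    """Per-frame attention whose keys and values come from a mixed token set.
    Queries stay within the frame; no new weights are introduced."""
    outputs = []
    for t, frame in enumerate(frames):
        mixed = mix_tokens(frames, t, n_mixed)
        outputs.append(attend(frame, mixed, mixed))
    return outputs


if __name__ == "__main__":
    # Two frames, two tokens per frame, two channels per token (toy values).
    frames = [[[1.0, 0.0], [0.0, 1.0]],
              [[2.0, 0.0], [0.0, 2.0]]]
    print(token_mixing_attention(frames, n_mixed=1))
```

With `n_mixed=0` the sketch reduces to ordinary per-frame self-attention, which makes the effect of the mixing step easy to isolate. In a real model the queries, keys, and values would come from the pre-trained projection layers rather than from identity projections.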

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i2.25267